Examining Machine Learning for Adaptable End-to-End Information Extraction Systems

Authors

  • Oren Glickman
  • Rosie Jones
Abstract

Every component of a typical information extraction (IE) system has been the object of machine learning research, motivated by the need to reduce the time taken to transfer to new domains. In this paper we survey such methods and assess to what extent they can help create a complete IE system that can be easily adapted to new domains. We also lay out a general prescription for building an IE system in a new domain, employing existing components and technologies where possible. The goal is a system that can be adapted to a new domain with minimal human intervention (say, by someone who may be a domain expert but need not be a computational linguist). We propose research directions for automating the process further, reducing the need for hand-tagged training data by relying on biases intrinsic to the information extraction task and by employing bootstrapping and active learning.

Similar resources

Extracting Temporal Information from Open Domain Text: A Comparative Exploration

The utility of data-driven techniques in the end-to-end problem of temporal information extraction is unclear. Recognition of temporal expressions yields readily to machine learning, but normalization seems to call for a rule-based approach. We explore two aspects of the (potential) utility of data-driven methods in the temporal information extraction task. First, we look at whether improving r...

Transparent Machine Learning for Information Extraction: State-of-the-Art and the Future

The rise of Big Data analytics over unstructured text has led to renewed interest in information extraction (IE). These applications need effective IE as a first step towards solving end-to-end real world problems (e.g. biology, medicine, finance, media and entertainment, etc). Much recent NLP research has focused on addressing specific IE problems using a pipeline of multiple machine learning ...

Machine learning based Visual Evoked Potential (VEP) Signals Recognition

Introduction: Visual evoked potentials contain diagnostic information that has proved important for assessing the functional integrity of the visual system. Because amplitude decreases substantially under extra-macular stimulation in commonly used pattern VEPs, differentiating normal from abnormal signals can be quite an obstacle. Due to developments of use of machine l...

End-to-End Relation Extraction via Syntactic Structures and Semantic Resources

Information Extraction (IE) aims at mapping texts into a fixed structure representing the key information. A typical IE system tries to answer questions such as who is present in the text, what events happen, and when these events happen. The task is making possible significant advances in applications that require deep understanding capabilities such as question answering engines, dialogue sy...

A Multi-Level Boundary Classification Approach to Information Extraction

Information Extraction (IE) is the process of identifying a set of pre-defined relevant items in text documents. We investigate the application of Machine Learning classification techniques to the problem of Information Extraction. In particular we use Support Vector Machines and several different feature-sets to build a set of classifiers for Information Extraction (IE). We show that this appr...

Publication date: 1999